Metagenome fragment classification based on multiple motif-occurrence profiles
Authors
Abstract
A vast amount of metagenomic data has been obtained by simultaneously extracting multiple genomes from microbial communities, including genomes from uncultivable microbes. Analysis of these data leads to the discovery of novel microbes and the elucidation of new microbial functions. The first step in the analysis is to classify each sequenced read to the reference genome from which it may be derived. The Naïve Bayes Classifier is one method for this classification. To identify the origin of a read, it calculates a score based on the occurrence of DNA sequence motifs in each reference genome. However, large differences in the sizes of the reference genomes can bias the scoring of the reads, which might cause erroneous classification and decrease classification accuracy. To address this issue, we have updated the Naïve Bayes Classifier method to use multiple sets of occurrence profiles for each reference genome: genome sizes are normalized by dividing each genome sequence into a set of subsequences of similar length and generating a profile for each subsequence. This multiple-profile strategy improves the accuracy of the Naïve Bayes Classifier method on simulated and Sargasso Sea datasets.
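The multiple-profile idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: motifs are taken to be fixed-length k-mers (`K = 3`), counts are smoothed with add-one smoothing, a genome's score for a read is the best score over its segment profiles, and all function names (`split_genome`, `log_profile`, `score`, `build_profiles`, `classify`) are hypothetical.

```python
import math
from collections import Counter
from itertools import product

K = 3  # assumed motif (k-mer) length; the abstract does not specify one


def split_genome(seq, segment_len):
    """Split a genome into pieces of similar length (about segment_len each)."""
    n = max(1, round(len(seq) / segment_len))
    return [seq[i * len(seq) // n:(i + 1) * len(seq) // n] for i in range(n)]


def log_profile(seq, k=K):
    """Log-probability of each ACGT k-mer in seq, with add-one smoothing (assumed)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers) + len(kmers)
    return {m: math.log((counts[m] + 1) / total) for m in kmers}


def score(read, profile, k=K):
    """Naive Bayes log-score of a read; k-mers outside ACGT are skipped."""
    return sum(profile.get(read[i:i + k], 0.0)
               for i in range(len(read) - k + 1))


def build_profiles(genomes, segment_len):
    """One profile per segment, so every profile covers a similar sequence length."""
    return {name: [log_profile(s) for s in split_genome(seq, segment_len)]
            for name, seq in genomes.items()}


def classify(read, profiles):
    """Assign the read to the genome whose best segment profile scores highest
    (taking the maximum over segments is an assumption of this sketch)."""
    return max(profiles,
               key=lambda name: max(score(read, p) for p in profiles[name]))


# Toy reference genomes of very different sizes (made-up sequences).
genomes = {"gc_rich": "GCGGCC" * 40, "at_rich": "ATTAAT" * 10}
profiles = build_profiles(genomes, segment_len=60)
print(classify("ATTAATATTA", profiles))
```

Because every profile is built from a segment of similar length, the smoothed probabilities of a large genome and a small genome are estimated from comparable amounts of sequence, which is the normalization the abstract refers to. Note that this sketch ignores k-mers that span segment boundaries.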
Similar resources
Erratum to “Unsupervised Two-Way Clustering of Metagenomic Sequences”
and Bahrad Sokhansanj, "Metagenome fragment classification using N-mer frequency profiles," Advances in Bioinformatics, Volume 2008 (2008).
Metagenome Fragment Classification Using N-Mer Frequency Profiles
A vast amount of microbial sequencing data is being generated through large-scale projects in ecology, agriculture, and human health. Efficient high-throughput methods are needed to analyze the mass amounts of metagenomic data, all DNA present in an environmental sample. A major obstacle in metagenomics is the inability to obtain accuracy using technology that yields short reads. We construct t...
WEVOTE: Weighted Voting Taxonomic Identification Method of Microbial Sequences
BACKGROUND: Metagenome shotgun sequencing presents opportunities to identify organisms that may prevent or promote disease. The analysis of sample diversity is achieved by taxonomic identification of metagenomic reads followed by generating an abundance profile. Numerous tools have been developed based on different design principles. Tools achieving high precision can lack sensitivity in some ap...
Motif-Based Text Mining of Microbial Metagenome Redundancy Profiling Data for Disease Classification.
BACKGROUND: Text data of 16S rRNA are informative for classifications of microbiota-associated diseases. However, the raw text data need to be systematically processed so that features for classification can be defined/extracted; moreover, the high-dimension feature spaces generated by the text data also pose an additional difficulty. RESULTS: Here we present a Phylogenetic Tree-Based Motif Fin...
Analysis of Metagenome Composition by the Method of Random Primers
Metagenome, a mixture of different genomes (as a rule, bacterial), represents a pattern, and the analysis of its composition is, currently, one of the challenging problems of bioinformatics. In the present study, the possibility of evaluating metagenome composition by DNA-marker methods is investigated. These methods are based on using primers, short nucleic acid fragments. Each primer picks ou...
Journal title:
Volume 2, Issue
Pages: -
Publication date: 2014